Specification
Title of the invention 

System for simultaneously processing plural instructions 
Claims 

1. A system for simultaneously processing plural
instructions in a multistage pipeline computer
comprising an instruction reading unit for
simultaneously reading multiple instructions from a main
storage for storing instructions and operands, plural
decoding units for interpreting said read instructions
and identifying the types of said instructions and
operands, plural operand reading units for reading
required operands from said main storage or a general-
purpose register file according to the results given by
said decoding units, and plural processing units for
executing operations on said read operands according to
the types of said instructions, wherein said decoding
units comprise a decoding means for simultaneously
decoding plural instructions, a determining means for
determining whether said decoded instructions can be
processed in parallel, and a means for coupling said
instructions which can be processed in parallel, and
wherein said coupled plural instructions are processed
synchronously in pipelines between said plural operand
reading units and said plural processing units.

2. The system for simultaneously processing plural
instructions according to Claim 1, characterized by the
fact that said system comprises at least one status
register for indicating the operation result status of
said operations, wherein the operation result statuses
of said plural instructions which are coupled and
executed are reflected in said status register according
to the sequence of said instructions.

3. The system for simultaneously processing plural
instructions according to Claim 1, characterized by the
fact that said system comprises at least one status
register for indicating the operation result status and
a means for transferring the operation result statuses
between plural pipelines, wherein whether a conditional
branch instruction branches is determined according to
the result of selecting between the operation result
status transferred from another pipeline and the content
of said status register.

4. The system for simultaneously processing plural 
instructions according to Claim 1, characterized by the 
fact that said system further comprises a means for 
transferring the operation results between plural 
pipelines, wherein the transferred operation results are 
used to execute the operations of other instructions. 

5. The system for simultaneously processing plural 
instructions according to Claim 1, characterized by the
fact that said system is implemented in plural pipeline
processing units consisting of identical logic hardware.

6. The system for simultaneously processing plural 
instructions according to Claim 1, characterized by the 
fact that said system further comprises a general- 
purpose register file shared by plural pipeline 
processing units and having plural read/write ports and 
a cache memory having plural read/write ports so that 
simultaneous processing is performed. 

7. A system for simultaneously processing plural 
instructions in a multistage pipeline computer 
comprising an instruction reading unit for reading 
plural variable-length instructions from a main storage 
for storing variable-length instructions and operands, 
plural decoding units for interpreting said read 
instructions and identifying the types of instructions 
and the types of operands, plural operand reading units 
for reading required operands based on the decoding 
results from said decoding units, plural processing 
units for executing operations on said operands 
according to the types of the instructions, 
characterized by the fact that said decoding units
comprise plural decoders each having the same number of
bits as a minimum unit of variable-length instructions,
and plural instructions which are read from said main
storage are simultaneously decoded by using said
decoders to simultaneously decode each minimum unit of
an instruction and then using the decoded results to
identify the head of each instruction.

3. Detailed explanation of the invention

[Scope of the invention] 

The present invention relates to a processing unit 
which executes instructions sequentially, and 
particularly to a processing unit which executes
plural instructions simultaneously. More precisely, the
present invention relates to an architecture for 
executing plural instructions simultaneously while 
maintaining the succession of instructions in a 
processing unit consisting of plural pipelines. 

[Prior art technology] 

In the prior art, high performance of a general-purpose 
computer has been realized by multistage pipelining. 
The processes for executing an instruction, such as
instruction fetching, decoding, operand address
calculation, operand fetching, and operation, are treated
as distinct stages, and different instructions are
processed at each stage at the same time, so that high
performance is realized. A multistage pipeline system
assumes that instructions are executed sequentially. In
other words, instructions are executed sequentially based
on a program counter, and their execution sequence is
never changed unless program control instructions such
as branch instructions are executed. A conventional
general-purpose computer executes instructions on the
assumption that this succession of instruction execution
is maintained.

On the other hand, attempts to speed up processing by
executing plural instructions in parallel have been made
for a long time. These include, for instance, the CDC6600
described in "Parallel Operation in the Control Data
6600", Proc. of Spring Joint Computer Conference, 1964,
and, more recently, Motorola's MC88100 described in
"Supercomputing on Chip", VLSI System Design, May 1988,
pp. 24-33. A similar architecture can be seen in the
"multi-execution unit uni-processor system" described in
Patent Publication No. SHO 62-262142.

In the CDC6600 and MC88100, fixed-point and floating-point
calculations operate only on data in general-purpose
registers. Data transfer between general-purpose
registers and the main memory is executed by
dedicated load/store instructions. Plural processing
units are provided, and they can operate independently.
This structure allows the parallel execution of a
transfer instruction between the main memory and
general-purpose registers and an operation instruction,
as well as of plural operation instructions. This
architecture executes transfer instructions and
operation instructions asynchronously. Asynchronous
execution of these instructions is useful for exploiting
the potential parallelism in a program. However, it
also has some problems.

The first problem is that a complex control mechanism
may be required to maintain the instruction sequence.
To execute an operation on a datum, the datum in
question is transferred from the main memory to a
general-purpose register, the operation is executed on
the datum in the general-purpose register, and the result
is transferred from the general-purpose register back to
the main memory. This process requires three
instructions, i.e. load, operation, and store
instructions, which must be executed in order. However,
that order is not assured if the instructions are
executed asynchronously. Therefore, the CDC6600
architecture uses a scoreboard system to maintain the
sequence of instructions. The scoreboard system employs
a flag, known as a scoreboard bit, for exclusive control
of each entry in the general-purpose register file.
When an instruction is decoded, the flag of the
general-purpose register which holds the operand of the
instruction is turned on. The flag is cleared when the
execution of the instruction is completed. An
instruction which attempts to access a register whose
flag is ON is blocked and kept waiting until the flag is
turned OFF. This maintains the sequence of instructions
described above. In the "multi-execution unit
uni-processor system" described in Patent Publication
No. SHO 62-263142, plural processing units can be
provided to execute plural instructions asynchronously.
Its architecture comprises a more extensive exclusive
control mechanism for general-purpose registers in order
to ensure the sequence of instructions. This exclusive
control mechanism for general-purpose registers increases
hardware complexity and may reduce the performance of
operation processing.

Conventional pipeline systems ensure the sequence of
instructions when a series of sequential processes, such
as load, operation, and store, is executed. Therefore, it
is possible to simplify exclusive control for
general-purpose registers and to transfer loaded data
directly to operation instructions without passing
through general-purpose registers. In contrast, the
CDC6600 architecture stores the loaded data in
general-purpose registers, and operation instructions are
held until the scoreboard flag is cleared. This delays
the transfer of data between instructions because of the
exclusive control overhead.
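The scoreboard behaviour described above can be sketched as follows. This is a simplified model under stated assumptions, not the CDC6600 design: registers are numbered 0 to 15 (hypothetical), only the destination register's flag is set when an instruction is decoded, and an instruction is blocked if any register it reads or writes is flagged. The class and method names are illustrative.

```python
from typing import Iterable


class Scoreboard:
    """Simplified scoreboard: one busy flag per general-purpose register.

    Assumption for illustration: the destination register's flag is set at
    decode and cleared at completion; any instruction touching a flagged
    register must wait.
    """

    def __init__(self, num_regs: int = 16) -> None:
        self.busy = [False] * num_regs

    def can_issue(self, srcs: Iterable[int], dst: int) -> bool:
        # Blocked if any source or the destination is still owned by an
        # instruction that has not completed.
        return not any(self.busy[r] for r in (*srcs, dst))

    def issue(self, srcs: Iterable[int], dst: int) -> None:
        src_regs = tuple(srcs)
        if not self.can_issue(src_regs, dst):
            raise RuntimeError("register busy: instruction must wait")
        self.busy[dst] = True  # flag turned on when the instruction is decoded

    def complete(self, dst: int) -> None:
        self.busy[dst] = False  # flag cleared when execution completes


# Load -> operation: the operation must wait until the load completes.
sb = Scoreboard()
sb.issue(srcs=(), dst=1)                 # load r1 from memory
print(sb.can_issue(srcs=(1, 2), dst=3))  # r3 = r1 + r2 -> False
sb.complete(1)                           # load finishes, flag cleared
print(sb.can_issue(srcs=(1, 2), dst=3))  # -> True
```

Every blocked instruction waits for a flag to clear. This is the exclusive control overhead that the passage identifies as delaying data transfer between instructions.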

The second problem is that asynchronous execution of
instructions complicates the status control of the
processing units. Conventional single pipeline systems
do not change the execution sequence of instructions, so
the status of the processing unit, as reflected in the
status registers, changes in the order of the
instructions. As long as this is assured, status control
is easy. For instance, it is assured that the execution
results of the preceding instructions are stored in the
status registers before the condition is determined for
a conditional branch instruction, which transfers control
according to the status of the processing unit. For
interrupts, the status of the processing unit at the time
of the interrupt can be easily determined. It is also
easy to reproduce the status of the processing unit after
some procedures have been executed in response to an
interrupt request.

Conversely, when instructions are executed
asynchronously, there is no guarantee that the
instructions are executed sequentially, which complicates
the status management of the processing units. For
instance, if an instruction for generating a condition
and a conditional branch instruction are executed
asynchronously, there is no assurance that the execution
result that generates the condition is reflected in the
status register before the conditional branch
instruction is executed. To solve this problem,
Motorola's MC88100 transfers the execution result status
of an instruction that generates a condition to a
conditional branch instruction via a general-purpose
register. This allows an instruction that generates a
condition to be synchronized with a conditional branch
instruction by means of the exclusive control mechanism
of the general-purpose registers. However, this approach
requires conditional branch instructions to have a format
in which they can specify a general-purpose register as
an operand. It cannot be used in a computer whose
instruction set lacks this format.

[Problems overcome by the invention] 

As is described above, a conventional system for 
simultaneously processing plural instructions wherein 
instructions are executed asynchronously using plural 
different execution units has the problem that a complex 
exclusive control mechanism is required for general- 
purpose registers in order to ensure the essentially 
sequential processing of instructions. In addition, the 
exclusive control mechanism may cause a reduction in 
performance due to the data transfer overhead among 
instructions. Furthermore, the management of the status 
registers of processing units is complicated because 
asynchronous execution of instructions does not ensure
their sequence. 

Patent Publication No. SHO 62-65133 discloses a means
for executing plural instructions simultaneously within
the time for executing one instruction. However, it does
not describe how the plural instructions are to be
executed.

A purpose of the present invention is to provide a 
processing unit and a system for simultaneously 
processing plural instructions, which enables 
simultaneous processing of plural instructions while 
maintaining their succession according to the sequence
of instructions stated in a program. 

Another purpose of the present invention is to provide
a system for simultaneously processing plural 
instructions, comprising at least one status register 
for plural processing units, said status register being 
updated according to the sequence of instructions stated 
in a program. 

Another purpose of the present invention is to provide a 
system for simultaneously processing instructions, 
wherein the general-purpose register file does not 
require an exclusive control mechanism. 

Another purpose of the present invention is to provide a 
system for simultaneously processing instructions, 
comprising repetitive hardware structures having the 
same logic. 

[Problem resolution means] 

The above purposes are attained by configuring a 
processor using plural pipelines consisting of hardware 
of the same logic and by a system for simultaneously 
processing plural instructions, comprising a means for 
decoding plural instructions simultaneously, a means for 
determining whether the decoded instructions can be 
executed in parallel and, if so, coupling these 
instructions, and a means for always executing the 
coupled instructions synchronously in plural pipeline 
processing units. The processor consisting of plural
pipelines is provided with a single status register which
indicates the operation result status. The operation
result statuses of the plural coupled instructions are
combined according to the instruction sequence in the
program and simultaneously reflected in the status
register. The register file and cache memory are shared
by the plural pipelines and have plural read/write ports
for providing operands to each pipeline simultaneously.



[Efficacy] 
The decoder decodes, picks up, and analyzes plural
instructions simultaneously, regardless of whether they
are fixed-length or variable-length instructions. Whether
the plural instructions picked up can be executed in
parallel is determined by analyzing the operand
combinations and comparing the instruction types. If
parallel execution is possible, these instructions are
coupled and executed using plural pipelines. In this
case, the plural coupled instructions are executed
synchronously. In other words, these plural
instructions are present at the same stage of each
pipeline regardless of the complexity of the
instructions. The operation result statuses in the
pipelines are merged according to the instruction order
and reflected in the single status register. Thus, plural
instructions can be executed without their sequence being
changed, maintaining their succession. Ensuring the
succession of instructions allows simplification of the
exclusive control of the general-purpose registers. It
is also ensured that the single status register is
updated according to the instruction sequence, which
facilitates the status management of the processing units
for conditional branch instructions and interrupts.

[Embodiments]

An embodiment of the present invention is described 
hereinafter, with reference to the drawings. Fig. 2 
shows an example of a computer system to which the 
present invention is applied. Cluster computers 100, 
110, and 120 are connected to a global memory 130 at 
global memory ports 131, 132, and 133. The cluster 
computers share the global memory 130 which is 
duplicated for high reliability. Each cluster computer 
is connected to magnetic disks 141, 142 or terminals 143, 
144 via an I/O switching network 140. In the cluster 
computer 100, processing units 103, 104, 105, and 106 
are connected to a common memory 101 via a common bus 
102 and a memory port 108. In the common memory 101 are
stored programs and data necessary for the processing 
units. An input/output port 107 is used for the 
processing units to access input/output devices such as 
the magnetic disks 141 and 142. 
The internal architecture of the processing unit 103 is
described hereafter, with reference to Fig. 3. An
instruction cache memory 230 temporarily holds the
instructions to be executed in the processing unit 103.
An instruction fetch unit 200 reads the instructions
from the instruction cache memory 230 and transfers them
to the instruction execute unit 210. Logical addresses
201 which are output from the instruction fetch unit 200
are converted to physical addresses by an instruction
address convert buffer 220 and supplied to the
instruction cache memory 230. Instructions read from
the instruction cache memory 230 are supplied to the
instruction fetch unit 200 via a bus 202. The instruction
fetch unit 200 also controls the instruction fetch
direction, for which it has an internal branch predict
buffer. When the instruction fetch unit 200 detects a
branch instruction among the fetched instructions, it
accesses the branch predict buffer to determine the
branch target address.
An operand cache memory 250 temporarily holds operands
accessed by the instruction execute unit 210. An
operand address convert buffer 240 converts logical
addresses 203 which are output from the instruction
execute unit 210 to physical addresses and transfers
them to the operand cache memory 250. The instruction
execute unit 210 receives instructions from the
instruction fetch unit 200, decodes them, and performs
operand address calculation, operand fetch, and operation
according to the decoding results. A common bus monitor
280 monitors transactions on the common bus 102 and, if
necessary, invalidates or updates the operand cache
memory 250. This ensures the consistency of the operand
cache memories provided in the plural processing units.

The instruction fetch unit 200 is described next, with
reference to Fig. 4. A fetch pointer 300 holds the
address of the instructions to be fetched. As long as
instructions are fetched sequentially, a selector 302
selects the output of an adder 301, so the fetch pointer
is incremented by a fixed amount. In this embodiment,
16 bytes are read per instruction fetch, and accordingly
the fetch pointer 300 is incremented by 16.
If a branch instruction is included in the fetched
instructions, the selector 302 selects a branch address
304 which is sent from the branch predict buffer 330 or
the instruction execute unit, and sets the fetch pointer
300 to the branch address. Instructions are read from
the instruction cache memory 230 according to the
address in the fetch pointer 300 and stored in the
instruction buffer 310. The instruction buffer 310 is a
first-in first-out buffer and is assumed to have eight
16-byte entries. The instruction buffer 310 has a read
address register 312, which can point to any byte
position of the instruction buffer 310. An aligner 311
reads 16 bytes of information from the indicated byte
position and sends it to a decoder 314. An instruction
pick-up component 315 informs an adder 313 of the size
of the picked-up instruction, and the adder 313
determines the new value of the read address register 312.

The decoder 314 decodes the 16 bytes of information read
from the instruction buffer using plural decoders each
having the same number of bits as the minimum unit of
instructions. Here, the minimum unit of instructions is
2 bytes, and accordingly the 16 bytes of information are
decoded simultaneously using eight decoders of two bytes
each. The analysis results from these eight decoders are
transferred to the instruction pick-up component 315.
The instruction pick-up component 315 picks up a first
instruction 319 and determines the size 316 of the first
instruction, and also picks up a second instruction 325
and determines the size 326 of the second instruction,
according to the information given by the decoder. In
this embodiment, two instructions are picked up
simultaneously. However, more than two instructions can
be picked up simultaneously. With the decoding system
described above, plural variable-length instructions can
be picked up simultaneously. The picked-up first and
second instructions 319 and 325 and their size
information 316 and 317 are simultaneously stored in the
execute unit instruction buffer 340.
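The pick-up scheme can be sketched as follows. The text does not give the instruction format, so this sketch assumes a hypothetical encoding in which the top two bits of an instruction's first halfword give its length (1 to 4 halfwords). All eight 2-byte decoders run over the 16-byte window at once, each treating its halfword as a possible instruction head. The pick-up step then chains from offset 0 using those precomputed lengths, so locating the second instruction head costs a table lookup rather than a second sequential decode.

```python
WINDOW_BYTES = 16  # bytes delivered by the aligner per cycle
UNIT_BYTES = 2     # minimum instruction unit (one decoder per unit)


def decode_unit(halfword: int) -> int:
    """Hypothetical per-halfword decoder.

    Returns the instruction length in bytes *if* this halfword were the head
    of an instruction. Assumed encoding: top two bits = halfword count - 1.
    """
    return ((halfword >> 14) + 1) * UNIT_BYTES


def pick_up(window: bytes, count: int = 2) -> list[tuple[int, int]]:
    """Return (offset, size) for up to `count` consecutive instructions."""
    if len(window) != WINDOW_BYTES:
        raise ValueError("expected a 16-byte window")
    # Eight decoders working in parallel: one speculative length per halfword.
    lengths = [
        decode_unit(int.from_bytes(window[i:i + UNIT_BYTES], "big"))
        for i in range(0, WINDOW_BYTES, UNIT_BYTES)
    ]
    picked = []
    offset = 0
    while len(picked) < count and offset < WINDOW_BYTES:
        size = lengths[offset // UNIT_BYTES]  # next head found by lookup
        if offset + size > WINDOW_BYTES:
            break  # instruction continues past the window; wait for more bytes
        picked.append((offset, size))
        offset += size
    return picked


# A 4-byte instruction (0x4000 0x1234) followed by a 2-byte one (0x0000).
window = bytes.fromhex("4000 1234 0000 0000 0000 0000 0000 0000")
print(pick_up(window))  # [(0, 4), (4, 2)]
```

The speculative length computed for the halfword 0x1234 is discarded, because that halfword turns out to be the tail of the first instruction. This is the cost of decoding every minimum unit in parallel.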

On the other hand, a program counter 320 holds the
address in the main memory of the first instruction
picked up by the decoder 314. An adder 323 calculates
the address of the second instruction by adding the
size 316 of the first instruction to the program
counter 320. The addresses in the main memory
included, it sends the address of the branch instruction
to the branch predict buffer 330 by controlling the
selector 324 according to whether it is the first or the
second instruction. The branch predict buffer selects a
specific entry using the address sent, and the
comparator 336 compares the address with the compare
address tag 332 of that entry to determine whether the
branch instruction in question has been registered. If
the branch instruction is registered and the branch
predict bit 333 indicates that the branch is taken, the
branch address is set in the program counter 320 and the
fetch pointer 300. At this time, the instruction buffer
310 is entirely cleared. Then, the branch target
instruction 335 is stored in the instruction buffer 310
via the selector 303. On the other hand, no operation is
performed if the branch predict bit 333 indicates that
the branch is not taken.
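The lookup can be modelled as a direct-mapped table. The entry count (64), the use of halfword address bits for the index and tag, and storing only a target address in each entry are assumptions for illustration. The text specifies none of them, and this sketch omits the stored target instruction 335. The class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BPBEntry:
    tag: int       # compare address tag (cf. 332)
    taken: bool    # branch predict bit (cf. 333)
    target: int    # branch target address (assumed field)


class BranchPredictBuffer:
    """Direct-mapped branch predict buffer (size and indexing are assumed)."""

    def __init__(self, entries: int = 64) -> None:
        self.entries = entries
        self.table: list[Optional[BPBEntry]] = [None] * entries

    def _index_and_tag(self, addr: int) -> tuple[int, int]:
        halfword = addr >> 1  # instructions start on 2-byte boundaries
        return halfword % self.entries, halfword // self.entries

    def register(self, addr: int, taken: bool, target: int) -> None:
        index, tag = self._index_and_tag(addr)
        self.table[index] = BPBEntry(tag, taken, target)

    def predict(self, addr: int) -> Optional[int]:
        """Return the target if the branch is registered and predicted taken."""
        index, tag = self._index_and_tag(addr)
        entry = self.table[index]
        if entry is None or entry.tag != tag:  # comparator: not registered
            return None
        return entry.target if entry.taken else None  # not taken: no action


bpb = BranchPredictBuffer()
bpb.register(0x1000, taken=True, target=0x2000)
print(hex(bpb.predict(0x1000)))  # 0x2000: redirect PC and fetch pointer
print(bpb.predict(0x1080))       # None: same entry, different tag
```

A non-None result corresponds to loading the program counter and fetch pointer with the target and clearing the instruction buffer. A None result leaves sequential fetching undisturbed.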

The instruction execute unit 210 is described in detail
hereafter, with reference to Fig. 5. The embodiment in
Fig. 5 is configured to execute two instructions
simultaneously, but it can easily be configured to
execute more than two instructions simultaneously as
well. Two instructions are read simultaneously from an
execute unit instruction buffer 340, and decoders 400
and 401 determine their instruction types and operand
types. In this instance, it is assumed that the decoder
400 decodes the instruction to be executed first (the
first instruction) and the decoder 401 decodes the
instruction to be executed next (the second
instruction). This information is sent to an
instruction coupling checker 402. The instruction
coupling checker checks the instruction types and
operand conflicts and determines whether the two
instructions read from the execute unit instruction
buffer 340 can be coupled. The combinations of
instruction types that can be coupled are shown in
Fig. 7. Almost all instruction pairs can be coupled; the
exceptions are bit field and decimal operation
instructions, which are not coupled with any other
instruction. In addition, branch instructions and
subroutine link instructions cannot be coupled with
themselves. Furthermore, two instructions cannot be
coupled when the destination operand of one instruction
is the same as the source operand of the other
instruction.
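The coupling rules listed above can be written as a predicate. The full Fig. 7 table is not reproduced here, so the sketch encodes only the stated exceptions. The kind labels and the `Insn` type are hypothetical. "Cannot be coupled with themselves" is read as "not with another instruction of the same kind," which is an interpretation. The operand conflict check is symmetric, following the text's "one ... the other."

```python
from dataclasses import dataclass

# Coupling exceptions as stated in the text; kind names are hypothetical labels.
NEVER_COUPLED = {"bit_field", "decimal"}
NOT_WITH_OWN_KIND = {"branch", "subroutine_link"}


@dataclass(frozen=True)
class Insn:
    kind: str
    dst: frozenset[str] = frozenset()  # destination register names
    src: frozenset[str] = frozenset()  # source register names


def can_couple(first: Insn, second: Insn) -> bool:
    # Bit field and decimal operations are never coupled with anything.
    if first.kind in NEVER_COUPLED or second.kind in NEVER_COUPLED:
        return False
    # Branch / subroutine link: assumed to mean "not with the same kind".
    if first.kind == second.kind and first.kind in NOT_WITH_OWN_KIND:
        return False
    # Operand conflict: a destination of one is a source of the other.
    if first.dst & second.src or second.dst & first.src:
        return False
    return True


add = Insn("alu", dst=frozenset({"d7"}), src=frozenset({"d1"}))
use = Insn("alu", dst=frozenset({"d2"}), src=frozenset({"d7"}))
mov = Insn("alu", dst=frozenset({"d3"}), src=frozenset({"d4"}))
print(can_couple(add, use))  # False: a d7 register conflict
print(can_couple(add, mov))  # True
```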

For an instruction pair which the instruction coupling
checker 402 determines can be coupled, the following
pipeline stages, such as address calculation, operand
fetch, and operation, are executed synchronously. For a
pair determined not to be couplable, only the first
instruction is passed to the following stages. The
remaining instruction is treated as the first instruction
of the following pair, and it is decoded together with
the next instruction to determine whether the two can be
coupled.
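The pairing policy just described amounts to a greedy two-wide grouping of the instruction stream, as sketched below. The coupling predicate is supplied by the caller, for example a checker like the one modelled for the preceding paragraph. The function name and the string instruction labels are hypothetical, and the sketch assumes one group leaves the decode stage per cycle, ignoring stalls.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def issue_groups(insns: Sequence[T],
                 can_couple: Callable[[T, T], bool]) -> list[tuple[T, ...]]:
    """Group a program-ordered instruction stream into issue groups.

    A coupled pair goes down both pipelines together; otherwise only the
    first instruction is issued and the leftover becomes the first
    instruction of the next pair.
    """
    groups: list[tuple[T, ...]] = []
    i = 0
    while i < len(insns):
        if i + 1 < len(insns) and can_couple(insns[i], insns[i + 1]):
            groups.append((insns[i], insns[i + 1]))
            i += 2
        else:
            groups.append((insns[i],))  # issue alone; next one slides forward
            i += 1
    return groups


stream = ["i1", "i2", "i3", "i4", "i5"]
conflict = {("i3", "i4")}  # hypothetical: i4 reads a register that i3 writes
print(issue_groups(stream, lambda a, b: (a, b) not in conflict))
# [('i1', 'i2'), ('i3',), ('i4', 'i5')]
```

When i3 and i4 cannot be coupled, i4 is not discarded. It becomes the first instruction of the next check and is coupled with i5.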

The decoding results of the coupled instructions are set 
in groups of registers 410 to 414 for the first 
instruction and 415 to 419 for the second instruction. 
The register operand address registers 410 and 411 store
the register addresses of the source and destination operands
of the first instruction. The registers 418 and 419 
serve the same function for the second instruction. When 
the first instruction includes a memory operand, the 
register address of the base register is stored in the 
register 414, the register address of the index register 
in the register 413, and the displacement information in 
the register 412. The registers 415, 416, and 417 serve 
the same function for the second instruction. 

The process of the address calculation stage is described. 
When the first instruction includes a memory operand, its 
logical address has to be calculated. The address of the 
memory operand is calculated using an adder 421, which 
adds the contents of the base register in the address 
register file 420 indicated by the register 414, the 
contents of the index register in the address register 
file 420 indicated by the register 413, and the 
displacement information 412. The calculated address is
stored in a logical address register 425. The same
procedure is performed for the second instruction, and
the calculated logical address is stored in a register
426. The address register file 420 is shared by the
first and second instructions and has plural read ports
to allow simultaneous address calculation for the first
and second instructions.
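The address calculation performed by adder 421 can be sketched as below. The 32-bit address width, the register names, and the optional base and index fields are assumptions for illustration. The sketch shows both instructions reading the shared address register file in the same step, which is what the plural read ports provide.

```python
from typing import Mapping, Optional

MASK32 = 0xFFFF_FFFF  # assumed 32-bit logical addresses


def logical_address(regs: Mapping[str, int], base: Optional[str],
                    index: Optional[str], disp: int) -> int:
    """Base + index + displacement (cf. adder 421); unused fields count as 0."""
    b = regs[base] if base is not None else 0
    x = regs[index] if index is not None else 0
    return (b + x + disp) & MASK32  # wrap to the assumed address width


# Both instructions read the shared address register file in the same cycle.
address_regs = {"a0": 0x1000, "a1": 0x20, "a2": 0x8000}  # hypothetical names
first = logical_address(address_regs, "a0", "a1", 4)     # cf. register 425
second = logical_address(address_regs, "a2", None, -8)  # cf. register 426
print(hex(first), hex(second))  # 0x1024 0x7ff8
```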

The memory operand read stage is described. When the
first instruction includes a memory operand, the memory
is accessed using the logical address 425 obtained in the
address calculation stage. The logical address 425 is
converted to a physical address in the main memory by an
operand address convert buffer 430. Using the physical
address, an operand cache memory 431 is accessed, and the
memory operand read is stored in a register 434. In the
same way, the memory operand for the second instruction
is stored in a register 435. The operand address convert
buffer 430 and the operand cache memory 431 are shared by
the first and second instructions and have plural read
ports to allow simultaneous operand reading for the first
and second instructions. Note that if, for instance, a
cache miss occurs for the first instruction so that it
cannot proceed to the following stage, the second
instruction cannot proceed to the following stage either.
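The consequence of synchronous execution for a cache miss can be shown with a small cycle count. The stage list and cycle numbers below are hypothetical. The sketch only illustrates the rule that a coupled pair advances stage by stage together, so each stage takes as long as the slower of the two instructions. It ignores how later pairs queue up behind the stalled one.

```python
from typing import Sequence


def lockstep_cycles(first: Sequence[int], second: Sequence[int]) -> int:
    """Cycles for a coupled pair to traverse the pipeline stages.

    first/second give the cycles each instruction needs at each stage
    (1 = no stall). Because the pair moves synchronously, every stage
    takes as long as the slower of the two instructions.
    """
    if len(first) != len(second):
        raise ValueError("both pipelines must have the same stages")
    return sum(max(a, b) for a, b in zip(first, second))


# Hypothetical stages: address calc, address convert, operand read,
# operation, write. A 4-cycle miss on the first instruction's operand read
# also holds the second instruction.
print(lockstep_cycles([1, 1, 5, 1, 1], [1, 1, 1, 1, 1]))  # 9
```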

The configuration of the operand fetch stage is described.
If the first instruction has a register operand, the 
operand is read from a data register file 440 according 
to the information in register operand address registers 
432 and 433. On the other hand, if it is a memory 
operand, the operand is obtained from a register 434 via 
an aligner 441. 

If the source operand is the result of an instruction
executed in the immediately preceding step in the same
pipeline, the operand is obtained from an intra-pipe
bypass route 460. If, on the other hand, the source
operand is the result of an instruction executed in the
immediately preceding step in the other pipeline, the
operand is obtained from an inter-pipe bypass route 461.
The same procedures are performed for the second
instruction.
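The choice between the register file and the two bypass routes can be sketched as follows. The tuple representation of the previous step's results is hypothetical. So is the rule that, if both instructions of the previous pair wrote the same register, the later one in program order (the second slot) wins; the text does not say how that case is handled. The memory-operand path through aligner 441 is omitted.

```python
from typing import Mapping, Sequence, Tuple

# (pipeline slot, destination register, value) written one step earlier,
# listed in program order: slot 0 (first instruction) before slot 1.
Result = Tuple[int, str, int]


def fetch_operand(reg: str, slot: int, regfile: Mapping[str, int],
                  prev_pair: Sequence[Result]) -> Tuple[int, str]:
    """Return (value, path) for a source register of the instruction in `slot`."""
    for prev_slot, dest, value in reversed(prev_pair):  # newest result first
        if dest == reg:
            path = "intra-pipe bypass" if prev_slot == slot else "inter-pipe bypass"
            return value, path
    return regfile[reg], "register file"


regfile = {"d1": 5, "d2": 7}
prev_pair = [(0, "d1", 10), (1, "d2", 20)]
print(fetch_operand("d1", 0, regfile, prev_pair))  # (10, 'intra-pipe bypass')
print(fetch_operand("d2", 0, regfile, prev_pair))  # (20, 'inter-pipe bypass')
```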

Processing units 454 and 455 execute operations on the 
operands obtained at the operand fetch stage and the 
results are stored in registers 456 and 457. Following 
this, the operation results are stored in the address 
register file 420, data register file 440, or operand 
cache memory 431. The operation result statuses 462 and
463 (zero, overflow, and so on) of the processing units
454 and 455 are transferred to a status code generation
circuit 458 and reflected in a status register 459.

The status code generation circuit is described
hereafter, with reference to Fig. 5. The status code
generation circuit 458 has two functions. The first
function is to merge the operation result statuses of the
first and second instructions, taking the instruction
sequence into account, and to reflect the result in the
status register. The second function is to determine the
branch condition when a conditional branch instruction and
an instruction that generates the condition are processed
simultaneously. In the first function, the operation
result status 462 which is output from the processing
unit 454 for the first instruction and the operation
result status 463 which is output from the processing
unit 455 for the second instruction are input into a
status generation component 916. Because the first
instruction is to be executed before the second
instruction, the status generation component 916 overlays
the status 463 of the second instruction on the status
462 of the first instruction and stores the result in the
status register 459.
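The first function can be sketched as an in-order overlay of flag values. The text says only that the second instruction's status is reflected on the first instruction's status. The per-flag override used here is an assumption, as are the flag names Z (zero) and V (overflow), which are taken from "zero, overflow, and so on." A `None` status stands for an instruction that leaves the flags unchanged.

```python
from typing import Mapping, Optional


def merge_status(current: Mapping[str, bool],
                 first: Optional[Mapping[str, bool]],
                 second: Optional[Mapping[str, bool]]) -> dict[str, bool]:
    """Combine the statuses of a coupled pair in program order.

    Each argument maps flag names to values; None or a missing flag means the
    instruction leaves that flag unchanged (an assumption of this sketch).
    """
    new = dict(current)
    for status in (first, second):  # first instruction, then second
        if status:
            new.update(status)  # later instruction overrides earlier one
    return new


sr = {"Z": False, "V": False}
print(merge_status(sr, {"Z": True}, {"V": True}))   # {'Z': True, 'V': True}
print(merge_status(sr, {"Z": True}, {"Z": False}))  # {'Z': False, 'V': False}
```

Because the two statuses are applied in program order, the single status register ends up in the state that sequential execution of the pair would have produced.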



The second function is as follows. Assume that a
conditional branch instruction is executed in the
pipeline for the second instruction. The branch condition
determining information is stored in a register 904, and
the prediction result from the branch predict buffer is
stored in a register 905. When an instruction that
generates a branch condition and a conditional branch
instruction are executed sequentially, the branch
condition is already reflected in the status register
when the conditional branch is executed. Therefore, a
selector 914 selects the status register 459 and inputs
it into a branch determining circuit 915. The branch
determining circuit 915 determines whether the branch
should be taken using the status register 459 and the
branch condition determining information 904, and the
result is compared with the branch prediction result 905
using a comparator 916. If they are consistent, no
operation is performed. If they are not consistent, all
of the pipelines are canceled so that execution resumes
along the correct path.

Now consider the case in which an instruction for
generating a condition and a conditional branch
instruction are executed simultaneously. It is assumed
that the first instruction is the instruction for
generating a condition and the second instruction is the
conditional branch instruction. In this case, the branch
condition has not yet been reflected in the status
register 459 when the conditional branch instruction is
executed. Therefore, the selector 914 selects the
operation result status 462 of the first instruction and
inputs it into the branch determining circuit 915. The
branch determining circuit 915 determines whether the
branch should be taken using the operation result status
462 of the first instruction and the branch condition
determining information 904, and the result is compared
with the branch prediction result 905 using the
comparator 916. If they are consistent, no operation is
performed. If they are not consistent, all of the
pipelines are canceled so that execution resumes along
the correct path.
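The selector and branch check can be sketched as a single function. The status representation, the hypothetical `branch_if_zero` condition, and the flag that says whether the condition-setting instruction is the first instruction of the same coupled pair are assumptions of this sketch. The function only reports a misprediction; it does not model canceling the pipelines.

```python
from typing import Callable, Mapping, Tuple

Status = Mapping[str, bool]


def resolve_branch(condition: Callable[[Status], bool],
                   status_register: Status,
                   first_status: Status,
                   coupled_with_condition_setter: bool,
                   predicted_taken: bool) -> Tuple[bool, bool]:
    """Return (taken, mispredicted) for a conditional branch in the second pipeline."""
    # Selector 914: if the condition-generating instruction is the first
    # instruction of the same pair, its status has not reached the status
    # register yet, so use the status forwarded from the first pipeline.
    status = first_status if coupled_with_condition_setter else status_register
    taken = condition(status)                # branch determining circuit 915
    mispredicted = taken != predicted_taken  # comparator; mismatch -> cancel
    return taken, mispredicted


def branch_if_zero(status: Status) -> bool:
    return status["Z"]  # hypothetical condition: branch if the zero flag is set


sr = {"Z": False}
print(resolve_branch(branch_if_zero, sr, {"Z": True}, True, predicted_taken=False))
# (True, True): the forwarded status says "taken", so the prediction was wrong
```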

The configuration of the status code generation circuit
458 described above ensures that the status register 459
is updated according to the instruction sequence and
allows the simultaneous execution of a conditional branch
instruction and an instruction for generating a branch
condition.

The operation of the pipelines is described hereafter,
with reference to Figs. 1, 8, and 9. Fig. 1 shows the
pipeline configuration of an embodiment of the present
invention which realizes the simultaneous processing of
two instructions. Plural instructions are read from an
instruction cache memory 520 at the instruction fetch
stage 500. The plural instructions are simultaneously
picked up at the pre-decode stage 501. If a branch
instruction is included, the branch direction is
determined by accessing a branch predict buffer 521.
Instructions are read from an execute unit instruction
buffer at the instruction buffer stage 502. Two
instructions are simultaneously decoded, and whether they
can be coupled is determined at the decode and combine
stage 503. The logical addresses of the memory operands
are calculated at the address calculation stages 504 and
511, and they are converted to physical addresses at the
address convert stages 506 and 512. The operands are read
from an operand cache memory 523 or a register file 522
at the operand fetch stages 505 and 515. Operations are
performed on the read operands at the operation stages
509 and 516. The operation results are stored in the
operand cache memory 523 or the register file 522 at the
write stages 510 and 517. From the address calculation
stage onward, the two pipelines have the same logic.
Instructions coupled at the decode and combine stage
503 are executed synchronously in the two pipelines.

Fig. 8 shows a pipeline stage flowchart in which two
instructions are effectively processed simultaneously.
In the figure, the third and fourth instructions are
processed simultaneously by transferring the operation
result between the pipelines, as described above. The
ninth and tenth instructions are processed
simultaneously because the branch of the subroutine jump
instruction (the ninth instruction) is successfully
predicted.

Fig. 9 shows another pipeline stage flowchart for the
simultaneous processing of two instructions. In this
example, the third and fourth instructions cannot be
coupled because of a conflict on register d7. In this
case, the third instruction is executed alone, and the
fourth instruction is coupled with the fifth instruction
and executed with it. The eighth and ninth instructions
are successfully coupled; however, the eighth
instruction must wait because of a conflict on register
a0 with the seventh instruction. In this case, the ninth
instruction, which is coupled to the eighth, must also
wait. The synchronous pipelines thus ensure that the
instruction sequence is maintained.
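The coupling decision and the lock-step stall can be illustrated with a small Python scheduling model. Everything in it is an assumption made for illustration: the `Instr` record, the conflict rule in `can_couple` (the second instruction may not read or overwrite the first one's destination; the result-transfer path mentioned for Fig. 8 is ignored), the greedy pairing, and the fixed two-cycle result latency. The example program is invented and only mimics the situations described for Fig. 9; it is not the instruction sequence shown in the figure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: Optional[str] = None
    srcs: Tuple[str, ...] = ()


def can_couple(first: Instr, second: Instr) -> bool:
    # Hypothetical conflict rule: the second instruction may not read or
    # overwrite the register written by the first.
    if first.dest is None:
        return True
    return first.dest not in second.srcs and first.dest != second.dest


def form_groups(program: List[Instr]) -> List[Tuple[Instr, ...]]:
    """Greedy pairing in program order, as a stand-in for decode-and-combine."""
    groups: List[Tuple[Instr, ...]] = []
    i = 0
    while i < len(program):
        if i + 1 < len(program) and can_couple(program[i], program[i + 1]):
            groups.append((program[i], program[i + 1]))
            i += 2
        else:
            groups.append((program[i],))
            i += 1
    return groups


def schedule(program: List[Instr], latency: int = 2) -> List[Tuple[int, Tuple[str, ...]]]:
    """Return (issue cycle, names) per group. Coupled instructions issue in
    the same cycle, so a stall on either one delays both."""
    ready: Dict[str, int] = {}  # register -> first cycle its new value can be read
    cycle = -1
    timeline = []
    for group in form_groups(program):
        earliest = cycle + 1  # groups issue strictly in order
        for ins in group:
            for reg in ins.srcs:
                earliest = max(earliest, ready.get(reg, 0))
        cycle = earliest
        for ins in group:
            if ins.dest is not None:
                ready[ins.dest] = cycle + latency
        timeline.append((cycle, tuple(ins.name for ins in group)))
    return timeline


prog = [
    Instr("i3", dest="d7", srcs=("d1",)),
    Instr("i4", dest="d3", srcs=("d7",)),  # reads d7, so cannot couple with i3
    Instr("i5", dest="d4", srcs=("d2",)),
    Instr("i6", dest="d5", srcs=("d0",)),
    Instr("i7", dest="a0", srcs=("d6",)),
    Instr("i8", dest="d1", srcs=("a0",)),  # waits for a0 from i7
    Instr("i9", dest="d2", srcs=("d0",)),  # coupled with i8, so it waits too
]
for cycle, names in schedule(prog):
    print(cycle, names)  # issue cycles 0, 2, 3, 5
```

Here i3 issues alone because i4 reads d7, and i9 has no dependency of its own but is held with i8 until a0 becomes available, mirroring the synchronous behavior described above.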

[Efficacy of the invention] 

The present invention enables plural instructions to be
processed simultaneously while maintaining their
sequence, which simplifies exclusive control of the
general-purpose register file and allows high
performance. It also ensures that the single status
register is updated in instruction sequence, which
facilitates status management in the processing units.

4. Brief description of the drawings 

Fig. 1 is a schematic presentation showing the pipeline
configuration of an embodiment of the present invention.
Fig. 2 is a schematic presentation showing a computer
system to which the present invention is applied. Fig. 3
shows the internal configuration of the processing unit
in Fig. 2. Fig. 4 shows the internal configuration of the
instruction fetch unit in Fig. 3. Fig. 5 shows the
internal configuration of the instruction execute unit
in Fig. 3. Fig. 6 shows the internal configuration of the
status code generation circuit in Fig. 5. Fig. 7 shows
the possible couplings of instructions. Figs. 8 and 9
show examples of the pipeline stage flowchart.

500 ... instruction fetch stage,

503 ... decode and combine stage,

504, 511 ... address calculation stage,

508, 515 ... operand fetch stage,

509, 516 ... operation stage,

522 ... multi-port register file,

523 ... operand cache.

Agent Patent Attorney Katsuo Ogawa 